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Human brain anatomy is strikingly diverse and liighly 
inlieritable: genetic factors may explain up to 80% of 
its variability. Prior studies have tried to detect ge- 
netic variants with a large effect on neuroanatomical 
diversity, but those currently identfied account for 
<5% of the variance. Here we show, based on our 
analyses of neuroimaging and whole-genome geno- 
typing data from 1,765 subjects, that up to 54% of 
this heritability is captured by large numbers of 
single nucleotide polymorphisms of small effect 
spread throughout the genome, especially within 
genes and close regulatory regions. The genetic 
bases of neuroanatomical diversity appear to be rel- 
atively independent of those of body size (height), 
but shared with those of verbal intelligence scores. 
The study of this genomic architecture should help 
us better understand brain evolution and disease. 

Keywords: Neuroanatomy, Heritability, Neuroimaging, Genet- 
ics 



INTRODUCTION 

Family studies show that a large part of the variability of differ- 
ent human brain structures is determined by genetic factors. 
Because we know a priori the degree of genetic relationship 
between monozygotic and dizygotic twins, or between mem- 
bers of a family, we can decompose the variability of a pheno- 
type into genetic and environmental components. Various stud- 
ies have demonstrated in this way that neuroanatomical pheno- 
types, such as brain volume or cortical surface, are highly in- 
heritable, with genetic factors accounting for up to 80% of their 
variability (Winkler et al, 2010; Stein et al, 2012; Blokland, de 
Zubicaray, McMahon, & Wright, 2012). These results are par- 
ticularly important for psychiatric research. Different psychiat- 
ric disorders have been associated with characteristic changes 
in brain anatomy, such as a higher incidence of macrocephaly 
and increases of white matter volume in autism (Amaral, Schu- 
marm, & Nordahl, 2008), or reduced hippocampal and total 
brain volumes in schizophrenia (Steen et al., 2012). If these 
characteristic changes are modulated by the subject's genetic 
background, then this background may act as a protective 
factor or as a risk factor for the development of psychiatric 
conditions. 

Whereas family studies can infom us about the heritability of a 
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trait, different approaches are required to determine the nature 
of the genetic factors involved. Various efforts have been made 
to go deeper into the genetics of neuroanatomical diversity 
through candidate-gene approaches or through agnostic, gen- 
ome-wide association studies (Bis et al., 2012; Ikram et al., 
2012; Stein et al., 2012). These approaches have provided im- 
portant insights on the genetic bases of neuroanatomical di- 
versity, however, for the moment they account for only a small 
proportion of the phenotypic variance. 

Here we used a recently developed approach (Yang et al., 2010, 
2011), were the combined effect of hundreds of thousands 
SNPs is considered in a single test - instead of the massive uni- 
variate testing approach of classic genome-wide association 
studies (GWAS). We studied a large cohort of 1,765 adoles- 
cents from the IMAGEN project (Schumann et al., 2010), for 
whom neuroimaging, whole-genome genotyping and behavi- 
oural data was available. As in twin and family studies, we es- 
timated the amount of phenotypic variance explained by ge- 
netic relationships among subjects. By contrast, instead of us- 
ing expected relationships based on pedigree, we used a gen- 
ome-wide average of the difference in genotyping at each 
single nucleotide polymorphism (SNP) between unrelated sub- 
jects. By using different sets of SNPs to compute genetic rela- 
tionships, we were able to partition neuroanatomical variance 
into different SNP sets and investigate the genomic architecture 
of neuroanatomical diversity at a level of granularity intermedi- 
ate between that of family studies and candidate-gene or gen- 
ome-wide association studies. Finally, we used simulated phen- 
otypes to estimate the minimum number of causal SNPs likely 
to produce our observed results. 

RESULTS AND DISCUSSION 

Brain scans were obtained from a cohort of 2,089 adolescents 
(14.5±0.4 years old, 51% females) from the IMAGEN project 
( http://imagen-europe.com ) using magnetic resonance imaging 
in 8 European centres. We measured intracranial volume (ICV), 
total brain volume (BV), as well as the volume of the hippo- 
campus (Hip), thalamus (Th), caudate nucleus (Ca), putamen 
(Pu), globus pallidus (Pa), amygdala (Amy) and nucleus ac- 
cumbens (Acc) using validated automatic segmentation pro- 
grams (Buckner et al., 2004; Cox, 1996; Jenkinson, Bannister, 
Brady, & Smith, 2002; Patenaude, Smith, Kennedy, & Jenkin- 
son, 2011; Smith et al, 2004; Zhang, Brady, & Smith, 2001) 
(Figure la, Supplementary Figure 1, Supplementary Table 1). 
Individuals were whole-genome genotyped using lUumina 610- 
Quad and lUumina 660W-Quad chips. After various quality 
control filters, we conserved 269,308 informative, relatively in- 
dependent (R^<0.9) SNPs in a cohort of 1,765 unrelated sub- 
jects. 

First, we estimated the proportion of the phenotypic variance 
explained by all SNPs with a linear mixed-effects model with 
the genetic relationship matrix as the structure of the covari- 
ance between subjects using GCTA (Yang et al., 2010). We es- 
timated through simulation that we had >50% statistical power 
to find heritability values >45%, and >70% statistical power to 
find heritability values >55% (see Methods section). In all our 
analyses we included age, sex, and scanning centre as fixed ef- 
fects. To account for population structure effects, we also in- 
cluded the first 10 principal components (PCs) of the iden- 
tity-by-state (IBS) matrix as covariates (A. L. Price et al., 
2006). Figure lb shows the estimated proportion of the pheno- 
typic variance attributable to genetic variance (Vc/Vp) for the 
neuroanatomical structures under study (Supplementary Table 
2). The figure includes also estimates of Vc/Vp for height, as 
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Figure 1. a. Brain phenotypes. We measured intracranial volume (not represented), total brain volume (BV, in light grey) and several subcortical 
structures, Ca: Caudate nucleus, Acc: Nucleus accumbens, Pu: Putamen, Pa: Pallidum, Amy: Amygdala, Hip: Hippocampus, and Th: Thalamus, b. 
Variance captured by SNPs. Percentage of phenotypic variance ( Vp) due to inter-individual genetic relationships ( Kc), computed from all genotyped 
SNPs. In addition to brain phenotypes, the bar plot includes estimates of VcJVp for height, VIQ and PIQ. c. Effect of covarying body size (hcigiit) 
from brain phenotypes. The proportion of Vc/Vp after covaiying height (red bars) did not change substantially compared with those in lb (green 
bars), and maintained their statistical significance, d. Effect of covarying VIQ from brain plienotypes. The proportion of Vc/Vp after covarying VIQ 
(red bars) decreased especially for ICV and BV, where the estimates were no longer statistically significant (green bars: raw estimates from lb), e. 
Enricliment of variance captured by genie SNPs. Genie SNPs (gene boundaries ±50kbp) represent 64% of all SNPs. If all SNPs explained a similar 
amount of variance, genie SNPs should explain 64% of the total variance explained by SNPs (dashed line, green bars). They explained significantly 
more variance than expected for ICV, BV, Th and Pa; significantly less for Amy (red bars, error bars represent test variance). * P<0.05, ** P<0.01, 
unconnected. 



well as measurements of verbal intellectual quotient (VIQ) and 
performance intellectual quotient (PIQ) based on the Wechsler 
Intelligence Scale for Children. Our estimates for height 
(Fc/Fp=56%, P=0.0069), VIQ (Fc/Fp=56%, P=0.013) and PIQ 
(Kc/Fp=52%, P=0.02) were statistically significant, and consist- 
ent with those obtained previously in larger populations (Dav- 
ies et al., 2011; Yang et al, 2010). A total of 12 statistical tests 
were performed. Because of the correlation among phenotypes 
a simple Bonferroni correction would be too conservative. In- 
deed, a global test shows that there is a statistically significant 
(P=0.001I) excess of P-values <0.05 (Methods section). 



We found that a large proportion of the variance of neuroana- 
tomical phenotypes was explained by the additive effect of gen- 
otyped SNPs — for example, 44% (P=0.031) of the variance in 
total brain volume (BV), 53% (P=0.01) of the variance in hip- 
pocampal volume (Hip) and 54% (P=0.011) of the variance in 
intracranial volume (ICV). Combined, the largest genome- wide 
association studies to date for BV (Stein et al., 2012), Hip (Bis 
et al, 2012; Stein et al., 2012) and ICV (Ikram et al., 2012; 
Stein et al., 2012) (N~20,000) found 1 SNP associated to Hip 
volume and another associated to ICV, each explaining < 0.5% 



2 



Downloaded from http://biorxiv.org/ on September 18, 2014 



3 o 10 + 




1 100 200 300 +00 500 600 700 BOO 900 IDOO 



Number of causal ijariams used to produce the simulated prterotsfpe 



2! 


_o 










Q 


rc 


5 




e: 








100 




qT 






o 

rH 


90 


i 

w 


□ 


80 


c 
o 




70 




(0 








50 


3 






E 


b 


50 












40 


rtio 




30 


o 






a. 




20 






a. 


iffet 


10 






D 



3S 



^1 7 
^1 E 

~Max(-logio(P)l 



10 1000 2000 3000 1000 5000 6000 7000 SOOO 9000 10000 

Number of causal variants used to produce the simulated pberiotype 

Figure 2. Distribution of smallest P-value in the GWAS witii simulated phenotypes as a function of the number of causal SNPs used to 
generate them. Simulated phenotypes were produced with a number of causal SNPs varying from 1 to 1,000 (a), and from 1 to 10,000 (b). The effect 
of causal SNPs were drawn from a nomial distribution, and the heritability of the simulated phenotypes was fixed at 50%. Ninety five percent of 
simulated phenotypes with <220 causal SNPs had a smallest P-value <10 By contrast, simulated phenotypes produced with >500 causal SNPs had 
most often a smallest P-value of the order of 10 ' or 10 \ The top plot in (a) and (b) shows the most frequent order of magnitude of the smallest P- 
value as a function of the number of causal SNPs. 



of the variance - of the same order of magnitude as for other 
quantitative traits, such as height. Approximately 50% of the 
additive genetic factors affecting neuroanatomical variability 
may be then supported by a large number of SNPs, each of 
small eiiect. 

To get an idea of the minimum number of SNPs likely to pro- 
duce our observed results we simulated 10,000 phenotypes 
with additive heritability of 50% produced by 1 to 1,000 causal 
SNPs and 10,000 phenotypes produced by 1 to 10,000 SNPs. 
Causal SNPs were randomly selected from among the original 
518k genotyped SNPs before filtering (i.e., their effect may 
be noticeable only through linkage disequilibrium), and their 
effect sizes drawn from a noraial distribution to obtain 50% 
heritability. We performed GWAS for all our phenotypes, and 
recorded the order of magnitude of the smallest P-value, which 
varied from 10"^ ' to 10"'* (Table 1). We then did the same for 
each of the 20,000 simulated phenotypes. Figure 2 shows the 
proportion of simulations with smallest P-value of the order of 
10"^, lO""", etc., as a function of the number of causal SNPs used. 
We observed that 95% of the phenotypes simulated with <220 
causal SNPs had P-values <10"*. By contrast, the order of the 
smallest P-value in the GWAS for ICV, for example, was 10"^ '", 
and 95% of simulations with smallest P-value <10"^ '' were pro- 
duced by <850 causal SNPs. Similarly, the order of the smallest 
P-value in the GWAS for BV was lO-*" *, and 95% of the simula- 
tions with P-values smaller than that were produced by <420 
causal SNPs. If the distribution of effect sizes of causal SNPs 



for ICV and BV were similar that used in our simulations, our 
phenotypes should likely be produced by hundreds of causal 
SNPs and possibly thousands of them. 

The variance estimates for different brain structures were het- 
erogeneous, and appeared to be differently related to height, 
VIQ and PIQ (Supplementary Table 2). For example, whereas 
the variance explained by SNPs was high and statistically sig- 
nificant for the hippocampus (Fg/F;>=53%, P=0.01), this was 
not the case for the caudate nucleus (Vg/Vi^\6%, P=0.25) — a 

Table 1. Order of magnitude of the smallest P-value in the 
GWAS for each phenotype. 



Phenotype -loglO(Smallest P-value) 

ICV 5^95 

BV 6.75 

mp 5.50 

Th 5.72 

Ca 5.91 

Pu 5.49 

Pa 5.67 

Amy 5.14 

Acc 5.60 

Height 5.94 

VIQ 5.32 

PIQ 5.49 
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Figure 3. Va/Vp versus gene set length. The amount of variance captured by SNPs increased with the number of SNPs used to compute genetic- 
relationship matrices. In most cases, this was only the case for genic SNPs (Ref.Seq.±50kpb). * P<0.01, ** P<0.001, *** P<0.0001, uncorrected. 



structure of comparable volume, geometry, and variability, that 
presents a similar correlation with ICV (rHip;icv=0.51, 
rca/icv=0.52) and body size (rHip/Hei8i,t=0.15, rca/Heigi,t=0.21). This 
shows that the estimates of Vc/Vp are not merely determined by 
the structure's volume or shape, and could rather reflect a vary- 
ing influence of genetic and environmental factors. Our vari- 
ance estimates were not significantly affected by population 
structure - not including the 10 first principal components of 
the IBS matrix changed on average the estimates of variance by 
less than 1% (P=0.93). The estimates of variance did not 
change significantly either if height or PIQ were covaried (Fig- 
ure Ic, Supplementary Table 2). By contrast, including VIQ 
scores as covariate decreased substantially Vg/Vp estimates for 
ICV and BV, but not for subcortical structures (Figure Id). For 
example, ICV has a moderate correlation with height and VIQ 
(in our cohort ricv/HeigiM=0.39 and ricv/viQ=0.18). The estimate of 
Vg/Vp for ICV was not significantly different if height was ad- 
ded as a covariate, however, it decreased from 54% to 32% (no 
longer statistically significant) if VIQ was included as a covari- 
ate. We performed bivariate analyses to estimate the genetic 
correlation between our phenotypes, i.e., the amount of genetic 
variance shared by each pair of phenotypes (Supplementary 
Table 3). In particular, these analyses showed indeed a strong 
genetic correlation between VIQ and ICV (rG=0.95, 
P=0.0047), and between VIQ and BV (rG=0.89, P=0.014), but 
a small, not statistically significant, genetic correlation between 
height and ICV (rG=0.20, P=0.25), and between height and BV 
(rG=0.23, P=0.24). Genetic correlation was also weak between 



PIQ and ICV (rG=0.02, P=0.48) and between PIQ and BV 
(rG=0.02, P=0.48). More than 90% of brain volume (BV) is 
constituted by the cerebral cortex and its cortico-cortical con- 
nections. Our results suggest that the genetic bases of ICV and 
BV diversity may be shared to a larger extent with those of 
VIQ than with those of PIQ or body size (height). 

A large proportion of the genetic variance captured by SNPs 
could be due to those located within genes and close regulatory 
regions. We obtained 20,022 gene boundaries from the UCSC 
Genome Browser hgl8 assembly. We made a first set with all 
SNPs within these boundaries, and two fitrther sets that in- 
cluded also SNPs 20kbp and 50kbp upstream and downstream 
from the 5' and 3' untranslated regions of each gene. Next, we 
computed genetic relationship matrices for those 3 SNP sets 
(±Okbp, ±20kbp and ±50kbp genic sets), and their comple- 
ments. Finally, for each of the 3 sets, we fitted the same linear 
mixed-effects model as before (including age, sex, centre and 
10 principal components), but using 2 genetic relationship 
matrices instead of 1: the genic matrix and its complementary 
nongenic matrix. Genic SNP sets explained up to 98% of the 
variance captured by all SNPs (Supplementary Table 4), which 
was in many cases significantly larger than what could be ex- 
pected from set length alone (Figure le. Supplementary Table 
5). For ICV, where 54% of the variance can be explained by all 
genotyped SNPs (N=273,926), using only SNPs within gene 
boundaries (N=l 08,339) explained 26% of the phenotypic vari- 
ance (P=0.054), and this proportion increased to 45% 
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Figure 4. Statistical power as a function of lieritability. Estimation 
of statistical power obtained tlirougli simulation of 10,000 phenotypes 
with different heritability values, supported by a different number of 
causal SNPs. We had >50% statistical power to find heritability values 
>45%, and >70% statistical power to find heritability values >55%. 

(P=0.0065) when the boundaries were expanded to ±20kbp 
(N=146,431), and to 49% (P=0.0058) when the boundaries 
were expanded to ±50kbp (N=l 74,334). The genie ±50kbp set 
contained 64% of all genotyped SNPs, but explained 91% of 
the variance of ICV attributable to SNPs, significantly more 
than what we would expect from its length alone (P=0.014). 

Previous reports have suggested that causal SNPs for height 
and IQ are relatively homogeneously distributed across the 
genome, and then, that increasing the number of SNPs used to 
create a genetic-relationship matrix increases proportionally the 
amount of phenotypic variance captured. We observed the same 
trend for our neuroanatomical phenotypes. We partitioned the 
genome into non-overlapping sets with different numbers of 
SNPs, and observed a strong correlation between set length and 
Vg/Vp (r=0.62 on average). The correlation was the same when 
only genie SNPs were selected (r=0.62), but smaller, and in 
most cases not statistically significant when only nongenic 
SNPs were selected (Figure 3). 

Finally, we partitioned Vc/Vp based on functional annotation 
(SNPs within genes involved in central nervous system func- 
tion (Lee et al., 2012; Raychaudhuri, Kom, & McCarroU, 
2010)), and relative minor-allele frequency. We did not observe 
statistically significant differences in the amount of variance 
explained by these different SNP sets compared with the ex- 
pectations based on their length (Supplementary Methods, Sup- 
plementary Tables 6, 7). 

In conclusion, our analyses indicate that a significant propor- 
tion of the heritability of neuroanatomical phenotypes may res- 
ult from the additive effect of hundreds of small-effect SNPs 
spread genome-wide. Such SNPs seemed to be largely inde- 
pendent from those related to body size (height) or reflecting 
population structure in our cohort. They were shared to a 
greater extent, however, with those associated with VIQ in the 
case of ICV and BV. An especially important role in determin- 
ing neuroanatomical diversity appeared to be played by SNPs 



within genes and close regulatory regions. 

Even if our variance estimates are large, they are still far from 
the estimates of additive genetic variance from pedigree stud- 
ies: -80% of the variance of various brain structures has been 
attributed to additive genetic factors (Stein et al., 2012). This 
difference may be due to a weak linkage disequilibrium 
between the genotyped SNPs and the real causative variants, to 
rare alleles with larger effect sizes or to common alleles with 
even smaller effect sizes. In any case, further progress will re- 
quire cohorts of maybe hundreds of thousands of individuals, 
underlining the necessity for international efforts such as the 
ENIGMA and CHARGE consortia. 

Recent studies have highlighted the importance of the additive 
effect of SNPs in detemining anatomical and cognitive di- 
versity in humans, but also their role in psychiatric disorders. In 
addition to the clear role of rare mutations in the susceptibility 
to psychiatric disorders (Cook Jr & Scherer, 2008), whole-gen- 
ome analyses of variance have shown that commonly geno- 
typed SNPs capture 23% of the risk to schizophrenia (Cross- 
Disorder Group of the Psychiatric Genomics Consortium, 
2013; Lee et al, 2012), 24% of the risk to Alzheimer's disease 
(Lee et al., 2013), and from 17% to 60% of the risk to autism 
spectrum disorders (Cross-Disorder Group of the Psychiatric 
Genomics Consortium, 2013; Klei et al., 2012). Due to the 
small individual effect of these SNPs, GWAS will require very 
large cohorts to explain any sizeable proportion of the trait's 
genetic variance (Park et al., 2010). Various structural and 
functional neuroimaging endophenotypes, on the other hand, 
have been frequently associated with psychiatric disorders 
(Meyer-Lindenberg & Weinberger, 2006), and their analysis us- 
ing whole-genome regression could inform us about the added 
effect of SNPs at a relevant intermediate level, closer to biolo- 
gical processes than cognitive or psychiatric tests. A global 
view of the genomic architecture of neuroimaging endopheno- 
types should not only allow us to better understand the biolo- 
gical bases of the susceptibility to psychiatric disorders - help- 
ing us, for example, to target future GWAS to more specific 
chromosomal regions and brain structures - but also to improve 
our understanding of the biological bases of brain diversity and 
evolution in humans. 

METHODS 

Neuroimaging 

Magnetic resonance imaging data were acquired at 8 European 
centres, using a standardised 3 Tesla, Tl -weighted gradient 
echo protocol (voxel size=l.l mm isotropic) based on that from 
the ADNI initiative 

( http://adni.loni.ucla.edu/methods/documents/mri-protocols '). 
MRI volumes were first linearly transformed to match the 
MNI152 atlas using FLIRT from FSL (Jenkinson et al, 2002; 
Smith et al, 2004) 

( http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT ). The inverse of the 
determinant of the transformation matrix was used to estimate 
intracranial volume (Buckner et al., 2004). Next, skull was 
stripped using 3dSkullStrip from AFNI (Cox, 1996) 
( http://afni.nimh.nih.gov '). and the grey matter, white matter 
and cerebrospinal fluid were automatically segmented using 
FAST (Zhang et al., 2001) 

( http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST ). The skull-stripped 
versions of the brain volumes, and the tissue segmentations 
were visually inspected and manually connected wherever ne- 
cessary. Total brain volume was estimated as the sum of total 
grey and white matter volumes. Finally, subcortical structures 
were automatically segmented using FIRST (Patenaude et al., 



5 



Downloaded from http://biorxiv.org/ on September 18, 2014 



2011) ( http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST ), and their ac- 
curacy visually controlled using in-house software. All volumes 
were logio converted. Their distribution is illustrated in Fig. SI. 
Despite the differences in average volume (from ~1 cm^^ for the 
amygdala, to -1,300 cm^ for total brain volume, Fig. SI), all 
structures showed a similar variability - there was a ~1. 8-fold 
change from the smallest to the largest volume in the cohort. 
The correlation matrix for all phenotypes analysed is shown in 
Table SI. 

Genotyping and quality control 

We used the autosomal SNPs common to the Illumina 610- 
Quad and Illumina 660W-Quad chips, and strict filtering to 
conserve high-quality SNPs only (minor-allele frequency >5%, 
genotyping rate >99%, significance threshold for Hardy- Wein- 
berg equilibrium test <10"'', subjects missing genotyping <10%, 
using PLINK (Purcell et al., 2007), 
http://pngu.mgh.harvard.edu/~purcell/plink ). We further ex- 
cluded SNPs in strong linkage disequilibrium (R^>0.9) within a 
window of 50 SNPs to prevent colinearity in our analyses. The 
final genotyping data consisted of 269,308 SNPs. 

Estimation of variance captured by SNPs 
The variance of a phenotype attributable to genetic factors is 
classically estimated by comparing the correlation between 
pairs of monozygotic (MZ) and dizygotic (DZ) twins. Pairs of 
MZ and DZ twins share a common maternal environment dur- 
ing foetal life, but MZ twins share 100% of their genomes 
whereas DZ twins share on average 50% of it. If the variance 
of the phenotype is affected by genetic factors, the correlations 
between MZ twins will be larger than those observed in DZ 
twins. The amount of variance due to genetic factors can be 
then estimated using Falconer's formula (Falconer, 1965), or 
more accurately, by using the restricted maximum likelihood 
method (Corbeil & Searle, 1976) (REML). The twin study 
design can be extended to the analysis of more complex family 
relationships, by using the levels of relationship expected from 
the pedigrees. In neuroscience, this approach has been success- 
fully used to show that genetic factors explain an important part 
of the variance of several neuroanatomical phenotypes such as 
brain volume (Stein et al., 2012), cortical surface extension 
(Winkler et al., 2010) or white matter microstructure (Ko- 
chunov et al., 2010). 

Twin and extended pedigrees studies provide important inforni- 
ation on the role of genetic factors in determining neuroana- 
tomical phenotypes, but complementary approaches are needed 
to investigate the nature of these genetic factors. In the recent 
years, several research groups have attempted to discover the 
genetic bases of the high heritability of neuroanatomical pheno- 
types by studying, for example, their association with SNPs in 
candidate genes (BDNF (Pezawas et al., 2004), microcephaly 
genes (Rimol et al., 2010), etc.) or through genome-wide asso- 
ciation studies (Bis et al., 2012; Ikram et al., 2012; Stein et al., 

2012) . Yet these findings explain today only a minute part of 
the phenotypic variability. 

The approach that we used here, implemented by the GCTA 
software (Yang et al., 2010, 2011) 
r http://www.complextraitgenomics.com/software/gcta ). estim- 
ates the variance captured by a large number of SNPs (mod- 
elled as random effects), providing information at a level inter- 
mediate between twin studies and association studies. Like 
twin and extended pedigree studies, GCTA estimates the part of 
the phenotypic variance due to the matrix of genetic relation- 
ships among subjects. But instead of using levels of genetic re- 
lationship expected a priori from a pedigree, these levels are 
computed from genotyping data (Lynch & Ritland, 1999; Rit- 



land, 1996; Yang et al., 2010). The relationship between pheno- 
type variance Var(y), the variance of the additive genetic effects 

2 2 

0„ , and the residual variance 0„ is formulated as fol- 
lows: 



Var[y) 



■Gol+lol 



where G is the genetic relationship matrix, containing the de- 
gree of genetic relationship, and I is the identity matrix. In 
GCTA the level of genetic relationship between each pair of in- 
dividuals j and k - the entries of the G matrix - is calculated as 
a weighted average across all SNPs: 



1 ^ (x,-2p,)(x,,-2p,) 



i=l 

N 



1 f x:-(lt2p,)^,-f2p; . 



where x,, equals 0, 1 or 2 depending on whether the genotype of 
the i-th SNP of the j-th subjects is AA, AB or BB; pi is the al- 
lele frequency of SNP /; and N is the number of SNPs con- 
sidered in the analysis (in our case, A'=269,308). The level of 
genetic relationship between two subjects is then a single value 
summarising how similar their genomes are. 

GCTA estimates variance components using the Average In- 
formation REML (Gilmour, Thompson, & CuUis, 1995) (AI- 
REML) method - a variant of the classic REML that provides a 
more efficient estimation even if the G matrix is large. The stat- 
istical significance of the genetic variance estimates (the P-val- 
ues reported here) were computed using a maximum likelihood 
ratio test (LRT) comparing the complete model which includes 
the genetic effect, to a partial model which excludes it. In 
REML analyses, LRT values are distributed as a 50% mixture 
of 0 and a Chi-square with degrees of freedom (df) equal to the 
number of genetic relationship matrices being tested. The LRT 
values and the corresponding degrees of freedom for the tests 
are indicated in the LRT column of Supplementary Tables S2, 
S4, S6-7. 

Confounding factors 

Several confounding factors could affect our variance estim- 
ates. For all our analyses we included age, sex and scanning 
centre as covariates. Additional analyses including Pubertal 
Development Scale scores (Carskadon & Acebo, 1993) did not 
affect the results and this covariate was no longer included in 
the model. Population structure might also affect neuroanatom- 
ical diversity or bias our variance estimates: 

1 . Our estimates could be affected by ciyptic relatedness if our 
cohort included subjects distantly related. In that case, pheno- 
typic similarity could be partly due to shared environment ef- 
fects or familial causal variants not captured by SNPs. To 
prevent this, we excluded subjects with a genetic relationship 
> 0.025 (i.e., more related than 3rd or 4th cousins). 

2. We used Admixture (Alexander, Novembre, & Lange, 2009) 
( http://www.genetics.ucla.edu/software/admixture') to estim- 
ate individual ancestry relative to the reference populations in 
HapMap 3 ("The International HapMap Project.," 2003) 
( http://hapmap.ncbi.nlm.nih.gov ). The result (Fig. S2) 
showed that individuals in our cohort have a strong European 
ancestry component. 
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3. It has been shown that including as covariates the first prin- 
cipal components of the matrix of identity-by-state (IBS) dis- 
tance between subjects efficiently accounts for population 
structure effects (A. L. Price et al., 2006; A. Price, Zaitlen, 
Reich, & Patterson, 2010). Usually, the first 4 or 5 principal 
components are included. Here, we included the first 10 prin- 
cipal components in all our analyses. We observed, however, 
that not including them affected only marginally our variance 
estimates. On average, the difference between estimates in- 
cluding and excluding the first 10 principal components of 
the IBS matrix was of 0.6%, not significantly different (2- 
tailed t-test, P=0.9346). 

4. Partitioning the variance explained by SNPs between genie 
and nongenic showed that in many cases the former ex- 
plained significantly more variance than the latter (see bellow 
for a description of the method). For example, SNPs within 
genes ± 50kbp explain 91% of the total variance of ICV cap- 
tured by SNPs. If our estimates were driven by population 
structure effects, we could expect an excess of Ancestry In- 
formative Markers (AIMs) within the genie SNP set. We ob- 
tained a list of 1 ,442 AIMs from Tian and collaborators (Tian 
et al., 2008), 604 of which were contained in our SNP list. 
There was no statistically significant difference in the number 
of AIMs between our genie and nongenic SNP sets (375 
genie AIM versus 229 nongenic AIMs, Fisher's exact test 
P=0.I723), and if anything, there was a tendency for AIMs to 
be underrepresented within the genie SNP set (Fisher's exact 
test P=0.0892). 

These analyses suggest that population stratification effects did 
not play a major role in the determination of neuroanatomical 
variability in our cohort (Table S2). 

Estimation of statistical power 

We simulated 10,000 phenotypes with different heritability val- 
ues, supported by a different number of causal SNPs. We uni- 
foraily sampled heritability values in the range from 0 to 80%, 
and number of causal SNPs from 1 to 10,000. The causal vari- 
ants were selected from the non-pruned list of SNPs (~518k 
SNPs), but the genetic relationship matrices were computed us- 
ing only SNPs from the pruned set (~270k SNPs). In con- 
sequence, the effect of some of the causal variants would be 
only captured through their linkage disequilibrium with the 
SNPs retained in the pruned list. Statistical power achieved to 
detect a given heritability was estimated as the proportion of 
test with P<0.05 (Figure 4). 

Correlation between SNP set size and Vc/Vp 
We constructed genetic relationship matrices for 3 sets of non- 
overlapping, randomly selected, SNPs of small, medium and 
large size. These sets were drawn from all genotyped SNPs, or 
only from genie SNPs (Ref Seq.±50kbp), or nongenic SNPs. 
We ensured that small, medium and large sets contained the 
same number of SNPs in all 3 groups by selecting 20%, 30% 
and 50% of the total number of nongenic SNPs, the less numer- 
ous group. We performed 100 repetitions of this procedure, 
each time randomly selecting non-overlapping sets of 20%, 
30% and 50% (20%+30%+50%=100%) of SNPs from all gen- 
otyped SNPs, or only from the genie or nongenic subgroups. 
For each repetition, we computed the correlation between 
Vc/Vp and set size. Correlation coefficients were converted to Z 
values using Fisher's transfomation, and the distribution tested 
against the null-hypothesis of no correlation (2-tailed t-test). 
The amount of variance of ICV, BV, subcortical volumes, 
height, VIQ and PIQ explained by the low, medium and long 
sets correlated significantly with the size of the SNP set (Figure 
2). 



Partition of V(/Vp 

We partitioned Vc/Vp among non-overlapping sets of SNPs, for 
example, genie and nongenic SNPs (2 sets) or SNPs of low, 
medium and high minor-allele frequency (MAF, 3 sets), etc. We 
computed a genetic relationship matrix Gj for each of 
these n sets, and used them as random effects in our model. The 
variance of our phenotypes Vcil''{y) was therefore de- 
composed as 



i=l 



where the number of sets would be n=2 for the case of a genie 
versus nongenic partition, or n=3 in the case of a partition into 
low, medium and high MAF. 

The LRT columns in Supplementary Tables S4, S6-7 indicate 
the value of the likelihood ratio test comparing the complete 
model (including all genetic relationship matrices) to the partial 
model that includes only the residual variance component. The 
Pmodei columns indicate the statistical significance of these val- 
ues given the number of variance components tested (degrees 
of freedom). 

As a posteriori analyses, we tested whether the variance ex- 
plained by one of these sets, genie SNPs for example, was lar- 
ger than what could be expected given its number of SNPs. The 
total genetic variance explained is 



where is the total number of SNPs, and A', the number of 
SNPs in set i, i=l,...,n. If all SNP sets were equivalent, then the 
amount of variance they explain should be simply proportional 
to their length, and then 

where EV/ is the expected amount of variance explained by the 
i'-th set. We wanted to test whether the difference VrEVi was 
significantly larger than 0, so we constructed a Z-score 

V-EV, 



where 



Note that V,- here is the estimated explained variance for group i 
- a random variable; whereas EV/ is a fixed value. We com- 
pared the observed value of Z, with those obtained from 
> 10,000 random permutations, where n non-overlapping SNP 
sets of size were randomly sampled from all available SNPs 
(without replacement). 

Partition of Vc/Vp based on involvement in central nervous 
system function 
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We looked at the proportion of Vc/Vp that could be attributed to 
genes preferentially expressed in the central nervous system, 
playing a role in neuronal activity, learning, or involved in syn- 
aptic function. We used the set of 2,725 genes defined by 
Raychaudhuri and collaborators (Raychaudhuri et al., 2010) 
and previously used in the SNP-based heritability analyses of 
the susceptibility to schizophrenia by Lee and collaborators 
(Lee et al., 2012). We made 3 SNP sets: the 1st set, CNS+, con- 
tained all SNPs within ±50 kbp of the 5' and 3' UTR of the 
gene set (N=61,175, 23% of the total number of SNPs); the 2nd 
set, CNS-, contained all the remaining genie SNPs (N=l 13,160, 
42% of the total number of SNPs); and the 3rd set regrouped all 
nongenic SNPs. As before, the genetic-relationship matrices 
computed using these 3 SNP sets were used in a single linear 
mixed model. We found that the amount of variance explained 
by the CNS+ set was not significantly different than what we 
expected from its size (Table S6). 

Partition of Vc/Vp based on MAF 

Allele frequency variations may provide hints about the evolu- 
tionary history of a trait. We estimated the proportion of Vc/Vp 
that can be attributed to sets of SNPs with low (5-20%), me- 
dium (20-35%) and high (35-50%) minor allele frequencies. 
SNPs in the low-frequency set were the most numerous, 48% 
of all SNPs, followed by medium-frequency SNPs (30%), and 
high-frequency alleles (22%). Table S7 shows the result of fit- 
ting a linear mixed model with the 3 genetic-relationship 
matrices computed using the low, medium and high-frequency, 
in addition to the same fixed effects as previously. We could 
expect each set to explain a fraction of the variance coreespond- 
ing to the proportion of the total number of SNPs they repres- 
ent. Furthennore, because SNPs of high MAF are individually 
more informative than those with low minor-allele frequency, 
they could potentially explain more variance (the variance of 
the genetic-relationship matrices increased from the low to the 
medium to the high frequency set). However, the amount of 
variance explained by the different sets was not significantly 
larger than what we expected from their size. 

Global test on P-values 

We performed our analyses on 12 correlated phenotypes (9 
brain regions, plus Height, VIQ and PIQ). Because of these 
correlations, a standard Bonferroni correction would be too 
conservative. Indeed, after Bonferroni correction, just a few 
results would remain statistically significant. However, under 
the null hypothesis we should expect around 5% of these tests 
to be significant, but the observed number of P-values <0.05 
was much larger. To test the significance of this excess we con- 
structed a statistic S from the list of P-values converted to Z- 
values obtained for each phenotype: 

m 

i=l 

where m=12 is the number of tests performed and ISF stands 
for the inverse survival fiinction of the normal distribution. We 
then generated the distribution of S under the null hypothesis 
by drawing from a multivariate Gaussian distribution with a 
variance-covariance structure given by the correlation matrix 
across phenotypes (Table SI). The significance of the excess of 
P-values was estimated as the proportion of scores under the 
null hypothesis that were greater than the observed score. The 
result of this global test is indicated at the final row of supple- 
mentary tables S2, S4-6. 
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